
Reduce memory usage during collection deletion #2492

Open
Funi1234 wants to merge 4 commits into pulp:main from Funi1234:fix/AAP-53296


@Funi1234 Funi1234 commented Apr 2, 2026

Reduce memory usage during collection deletion

Problem

Deleting a collection with many versions causes worker processes to consume excessive memory until they are killed with SIGKILL. Django loads every field of each CollectionVersion into memory, including the large JSON fields, and that cost is multiplied across all versions of the collection.

Solution

This PR applies targeted QuerySet field selection using .only() to load only the fields needed for deletion operations, avoiding the large JSON fields that aren't required.
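To illustrate why narrowing the selected fields matters, here is a minimal sketch using only the standard-library `sqlite3` module rather than Django or pulp_ansible code. The `collection_version` table and its columns are hypothetical stand-ins; the two queries approximate what an unrestricted queryset and a `.only("pk")` queryset ask the database for.

```python
import sqlite3

# Hypothetical stand-in for the CollectionVersion table; the column names are
# illustrative assumptions, not the real pulp_ansible schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE collection_version ("
    "id INTEGER PRIMARY KEY, collection_id INTEGER, docs_blob TEXT)"
)
big_blob = "x" * 100_000  # simulates one large JSON document
conn.executemany(
    "INSERT INTO collection_version (collection_id, docs_blob) VALUES (?, ?)",
    [(1, big_blob) for _ in range(50)],
)


def fetched_chars(sql):
    """Return the total number of text characters a query pulls into Python."""
    rows = conn.execute(sql, (1,)).fetchall()
    return sum(len(v) for row in rows for v in row if isinstance(v, str))


# Approximates an unrestricted queryset: every column is selected.
all_columns = fetched_chars(
    "SELECT * FROM collection_version WHERE collection_id = ?"
)

# Approximates .only("pk"): only the primary key column is selected.
pk_only = fetched_chars(
    "SELECT id FROM collection_version WHERE collection_id = ?"
)

print(all_columns, pk_only)
```

In this sketch the unrestricted query materializes every blob in the worker, while the primary-key query transfers no blob text at all, which is the effect the PR relies on.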

Changes:

  • pulp_ansible/app/galaxy/v3/views.py:
    • Use .only("pk") when iterating collection versions and their repositories in CollectionViewSet.destroy()
    • Batch AnsibleRepository lookup with filter(pk__in=...) to prevent N+1 queries
    • Use .only("namespace", "name", "version") when loading collection dependents
  • pulp_ansible/app/tasks/deletion.py:
    • Use .only("pk") when loading collection versions and iterating their repositories in delete_collection() task

By loading only the required fields, we avoid pulling large JSON blobs (docs_blob, metadata, dependencies, etc.) into memory when they're not needed for the deletion operation.
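The batched repository lookup mentioned in the changes can be sketched the same way. This is an illustration with the standard-library `sqlite3` module, not the code in this PR: the `repository` table, the `run` helper, and the query counter are assumptions made for the example. The `IN (...)` query plays the role of `filter(pk__in=...)`.

```python
import sqlite3

# Hypothetical stand-in for the AnsibleRepository table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repository (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO repository (name) VALUES (?)",
    [(f"repo-{i}",) for i in range(20)],
)

query_count = 0


def run(sql, params):
    """Execute one query and count it, so both strategies can be compared."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


repo_pks = list(range(1, 21))

# N+1 pattern: one query per repository primary key.
one_by_one = [
    run("SELECT id, name FROM repository WHERE id = ?", (pk,))[0]
    for pk in repo_pks
]
per_row_queries = query_count

# Batched pattern: a single IN (...) query, analogous to filter(pk__in=...).
query_count = 0
placeholders = ",".join("?" * len(repo_pks))
batched = run(
    f"SELECT id, name FROM repository WHERE id IN ({placeholders}) ORDER BY id",
    repo_pks,
)
batched_queries = query_count

print(per_row_queries, batched_queries)
```

Both strategies return the same rows; the batched form issues one query where the per-row form issues one per repository.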

Related Work


Funi1234 commented Apr 2, 2026

This will need to be backported to 0.25 (AAP 2.4/2.5/2.6) and 0.28 (AAP upstream/2.7), please.


@gerrod3 gerrod3 left a comment


set_collection_deferred_fields is a blunt instrument that exists to avoid changing multiple spots in our collection sync code (some of them in pulpcore!), so we shouldn't overuse it when optimizing pulp-ansible. In this case we can easily fix the issues by being smarter with the querysets we write.

@Funi1234 Funi1234 requested a review from gerrod3 April 4, 2026 21:43
@Funi1234 Funi1234 requested a review from gerrod3 April 7, 2026 10:06

2 participants